# Multi-task Visual Understanding
PE Spatial G14 448
Apache-2.0
The Perception Encoder (PE) is a state-of-the-art encoder for image and video understanding, trained through simple vision-language learning.
facebook
3,256
16
Florence 2 Base
MIT
Florence-2 is an advanced vision foundation model developed by Microsoft that uses a prompt-based approach to handle a wide range of vision and vision-language tasks.
Text-to-Image
Transformers

microsoft
316.74k
264
Featured Recommended AI Models